Computer control providing single-cycle branching.

An instruction processor suitable for use in a reduced
instruction-set computer employing an instruction pipeline
which performs conditional branching in a single processor
cycle. The processor treats a branch condition as a normal
instruction operand rather than a special case within a
separate condition code register. The condition bit and the branch
target address determine which instruction is to be fetched,
the branch not taking effect until the next-following
instruction is executed. In this manner, no replacement of the
instruction which physically follows the branch instruction in the
pipeline need be made, and the branch occurs within the single
cycle of the pipeline allocated to it. A simple circuit
implements this delayed branch method. A computer incorporating
the processor readily executes special-handling techniques
for calls on subroutines, interrupts and traps.
DIGITAL INSTRUCTION PROCESSOR CONTROL

This invention relates to a method and apparatus for
processing instructions for a digital computer, and more
particularly, for processing branch instructions in a pipeline
using only the single cycle allocated in the pipeline to the
instruction without need for branch prediction or complex
circuitry.

BACKGROUND OF THE INVENTION 
Reduced instruction set computers (RISC) recognize the
advantages of using simple decoding and the pipelined execution
of instructions. Branch instructions are required in a computer
to control the flow of instructions. A branch instruction in a
pipelined computer will normally delay the pipeline until the
instruction at the location to which the branch instruction
transfers control, the "branch address", is fetched. As such,
these instructions impede the normal pipelined flow of
instructions. Known in the prior art are elaborate techniques
which delay the effect of branches ("delayed branching"),
predict branches ahead of time and correct for wrong
predictions, or fetch multiple instructions until the
direction of the branch is known.

Since most of these techniques are too complex for a RISC
architecture, the delayed branch is chosen for it; the delayed
branch allows RISCs to always fetch the (physically) next
instruction during the execution of the current instruction.
As most RISCs employ pipelining of instructions, in the prior
art delayed branching requires two instruction processor clock
cycles to execute a branch instruction. This disrupts the
instruction pipeline. Complex circuitry was introduced into the
prior art to eliminate such disruption. Since branch
instructions occur frequently within the instruction stream,
prior art computers were slower and more complex than desired.
those skilled in the art how to effect control of the various
illustrated elements. 

With reference to Fig. 1, processor 10 includes a branch
target address (BRN TGT) multiplexer/register 12 which receives
from a general-purpose register file, not shown in Fig. 1, via a
data path 14 an address which contains the location to which a
branch is to be made by processor 10, the so-called "indirect
branch address". A second type of branch address, the so-called
"relative or absolute branch address", is determined by an adder
16 which at a first input receives from the instruction register
a branch displacement value via a data path 18. This value can
be added to the address of the presently-executed instruction
received at a second input to adder 16 via a data path 20,
resulting in the relative branch address. Should an addition not
be performed, only the branch displacement address will be
generated by the adder 16, resulting in the absolute branch
address. The address generated by adder 16 is conducted via a
data path 22 to a second input of BRN TGT register/multiplexer
12.
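
The address selection just described can be pictured in software. The following Python fragment is only an illustrative model under stated assumptions, not the circuit of Fig. 1: the function names and the `relative` and `indirect` flags are hypothetical stand-ins for control signals which the text does not name.

```python
def adder_16(displacement, path_20_address, relative):
    """Model of adder 16 (hypothetical name and signature)."""
    # With the addition performed, the result is the relative branch address;
    # without it, the displacement alone is the absolute branch address.
    return path_20_address + displacement if relative else displacement


def brn_tgt_12(register_file_address, adder_output, indirect):
    """Model of BRN TGT multiplexer/register 12 (hypothetical signature)."""
    # First input: indirect branch address from the register file (data path 14).
    # Second input: relative or absolute address from adder 16 (data path 22).
    return register_file_address if indirect else adder_output
```

For instance, `brn_tgt_12(40, adder_16(8, 100, True), False)` selects the relative address 108, while setting `indirect` to `True` selects the register-file value 40 instead.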

The branch target address selected by BRN TGT
register/multiplexer 12, as determined by a control signal
generated by processor 10 in accordance with the branch
instruction executed by processor 10, is generated at an output
thereof and conducted via data path 24 to a first input of a
multiplexer (MUX) 26 which has an output terminal connected via a
data path 28 to an input terminal of an instruction cache 30.
The instruction cache 30 contains a set of storage locations, 512
in the preferred embodiment, for storing sets of contiguous
instructions which constitute a portion of the program being
currently executed by processor 10. Application of an address at
the input terminal of cache 30 causes the instruction stored at
that address to be conducted to the instruction register of
processor 10 to become the next instruction to be executed
thereby.

Also receiving the address generated by MUX 26 is a program
counter (PC) stack 32 comprising a decode PC register 34 to which
is conducted via data path 28 the address generated by MUX 26, an
execute PC register 36 to which is conducted via a data path 38
the contents of the decode PC register 34, and a store PC
register 40 to which is conducted via a data path 42 the contents
of the execute PC register 36. The PC stack 32 implements a
four-stage instruction address pipeline, as will be described
below in connection with Fig. 2.
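
As a rough illustration of how the three serially connected registers track instruction addresses, the PC stack can be modeled as a shift register. This is a sketch only; the class name, attribute names and `clock` method are hypothetical and do not come from the specification.

```python
class PCStack:
    """Software model of PC stack 32 (hypothetical names throughout)."""

    def __init__(self):
        self.decode_pc = None   # stands in for decode PC register 34
        self.execute_pc = None  # stands in for execute PC register 36
        self.store_pc = None    # stands in for store PC register 40

    def clock(self, fetch_address):
        """Advance one processor cycle with the address on data path 28."""
        # All three registers load at once, each from its predecessor.
        self.store_pc, self.execute_pc, self.decode_pc = (
            self.execute_pc, self.decode_pc, fetch_address)
```

After three calls to `clock` with addresses 10, 11 and 12, the registers hold 12, 11 and 10 respectively, one address per pipeline stage behind the fetch.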

Also receiving the address generated by MUX 26 is an address
incrementer (+1) 44 which generates at an output the address 
applied to it via data path 28 incremented by 1, i.e., the 
continue address. A program counter (PROG CNT) register 46 
receives the continue address generated by incrementer 44 via a 
data path 48 and stores this address. The continue address is 
generated at an output of the PROG CNT register 46 and conducted 
via data path 20 to the second input of adder 16 and a second 
input of MUX 26. 

The MUX 26 receives a control signal based on the branch
condition, described above, to determine whether the branch
target address applied on data path 24 or the continue address
applied on data path 20 will be applied to the instruction cache
30 to fetch the next instruction to be executed by processor 10.
An instruction calling for a branch will be processed by 
processor 10 so that the branch does not occur until the 
instruction following the branch instruction is executed. In 
this manner, the instruction pipeline, implemented by the PC 
stack pipeline 32, operates without interruption even when a 
branch instruction enters the pipeline, since no replacement of 
the instruction which normally follows the branch instruction 
need be made in the pipeline. Accordingly, the branch occurs 
within the single cycle of the pipeline allocated to it, as will 
be described in connection with Fig. 2. 
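
The effect of this selection on the order of instruction fetches can be sketched as follows. The model assumes, consistently with the four-stage pipeline, that a branch reaches its execute stage two cycles after it is fetched; the `program` encoding and the function name are hypothetical and serve only to illustrate the delayed branch.

```python
def fetch_sequence(program, start, cycles):
    """Return the address applied to the instruction cache in each cycle.

    `program` maps an address to a (target, taken) tuple for a branch;
    addresses absent from the mapping hold ordinary instructions.
    """
    fetched = []
    prog_cnt = start  # models PROG CNT register 46 (continue address)
    for _ in range(cycles):
        # The instruction fetched two cycles ago is now in its execute stage.
        branch = program.get(fetched[-2]) if len(fetched) >= 2 else None
        if branch is not None and branch[1]:
            address = branch[0]   # MUX 26 selects the branch target address
        else:
            address = prog_cnt    # MUX 26 selects the continue address
        fetched.append(address)
        prog_cnt = address + 1    # incrementer 44 loads PROG CNT 46
    return fetched
```

With a taken branch to address 10 stored at address 0, `fetch_sequence({0: (10, True)}, 0, 4)` yields `[0, 1, 10, 11]`: the instruction at address 1, which physically follows the branch, is fetched without replacement before the target.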

A four-stage pipeline is used by the processor 10 of the
instant invention: an instruction fetch stage, an instruction
decode stage, an instruction execution stage, and a data storage
stage. The various stages of the instruction pipeline employed
by processor 10 are shown in Fig. 2, illustrating the execution
of a branch instruction.

By employing a so-called "delayed branching" technique, the
processor 10 can effect execution of a branch instruction in a
single processor cycle without requiring complex logic
circuitry. The space between the vertical dashed lines in
Fig. 2 corresponds to a single processor cycle, each cycle having
an equal duration. Shown extending from t0 to t1 during the
first cycle of the processor 10 is "BRANCH 1", in which a branch
instruction is fetched from instruction cache 30 and stored in
the instruction register of processor 10. Shown extending from
t1 to t2 during the second cycle of the processor 10, "BRANCH 2",
is the decoding of the branch instruction stored in the
instruction register. The branch condition needed by the
instruction is retrieved from the general-purpose register
described above and the branch target address specified by the
instruction is determined, as described above, and conducted on
data bus 24 to the first input of MUX 26.

During the execution cycle of the branch instruction
extending from t2 to t3, "BRANCH 3", the condition causes
processor 10 to generate a control signal which, in turn, causes
MUX 26 to select either the branch target address or the continue
address to be conducted to the instruction cache 30 for use in
fetching the next (logical) instruction. During the storeback
cycle, extending from t3 to t4, the instruction (physically)
following the branch instruction is in the execute stage of
processor 10.

As shown in Fig. 2, a branch delay instruction "DELAY"
physically follows the branch instruction and is always executed,
the branch itself not occurring until after the branch delay
instruction, whereupon the instruction to which the branch
instruction passes control, "TARGET", executes. In this manner,
the processor 10 can always fetch the next instruction during the
execution of the current instruction, i.e., operate in a pipeline
mode without the need to interrupt the pipeline or retract the
fetching of an instruction. Accordingly, Fig. 2 indicates that
during the second cycle, "DELAY 1", processor 10 fetches from cache
30 the instruction physically following the branch instruction
whose fetching, decoding, execution and storeback cycles were
just described. This instruction thus occupies the stage in the
pipeline immediately following the branch instruction which
preceded it. Hence, the decoding, execution and storeback cycles
for the branch delay instruction, "DELAY 2", "DELAY 3" and
"DELAY 4", will occur during the third, fourth and fifth cycles of
processor 10 as shown in Fig. 2.

The instruction to which control passes by virtue of the
branch instruction, "TARGET", will occupy the stage in the
pipeline immediately following the delay instruction, as shown in
Fig. 2. Thus the fetching, decoding, execution and storeback
cycles for the TARGET instruction, "TARGET 1", "TARGET 2",
"TARGET 3" and "TARGET 4", will occur during the third, fourth,
fifth and sixth cycles of processor 10 as shown in Fig. 2. The serial
connection of the decode PC register 34, the execute PC register
36 and the store PC register 40, clocked at the intervals t0, t1,
. . . , t6, implements the pipeline described above by storing the
addresses of instructions associated with the corresponding
pipeline stages.
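
The stage occupancy of the three instructions can be tabulated with a short sketch. It assumes only that each instruction enters the fetch stage one cycle after the instruction ahead of it and then advances one stage per cycle; the names `STAGES`, `stage_cycles` and `schedule` are hypothetical.

```python
STAGES = ("fetch", "decode", "execute", "store")


def stage_cycles(fetch_cycle):
    """Map each pipeline stage to the processor cycle in which it occurs."""
    return {stage: fetch_cycle + i for i, stage in enumerate(STAGES)}


# BRANCH is fetched in the first cycle, DELAY in the second, TARGET in the third.
schedule = {name: stage_cycles(cycle)
            for cycle, name in enumerate(("BRANCH", "DELAY", "TARGET"), start=1)}
```

Under these assumptions the storeback stage of TARGET falls in the sixth cycle, and in the fourth cycle BRANCH is in storeback while DELAY is in its execute stage.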

The processor 10 of the instant invention can execute a
branch which is called for by a call subroutine instruction by a
modification of the delayed branching technique described above
in connection with Fig. 2. The call subroutine instruction is
fetched from the cache 30 and stored in the instruction register
during the first cycle of processor 10 extending from t0 to t1,
denoted as "BRANCH 1" in Fig. 2. During the second cycle of the
processor 10, the call subroutine instruction, "BRANCH 2", is
decoded and the contents of the PROG CNT register 46 are generated
on data path 20, and via MUX 26, onto data path 28 for entry into
a data path pipeline, not shown in Fig. 1. During the
third cycle of the processor 10, "BRANCH 3", the contents of the
PROG CNT register are increased by four in an arithmetic logic
unit of the processor 10, not shown in Fig. 2, to establish a
return address from the subroutine. During the fourth cycle of
the processor 10, "BRANCH 4", the return address is saved in a
general-purpose register. In all other respects, the technique
for executing a call subroutine instruction is as described above
in connection with Fig. 2 for executing a branch instruction,
where the instruction to which control passes by virtue of the
call subroutine instruction, "TARGET", will be the first
instruction of the subroutine.

As the processor 10 of the present invention is capable of
servicing interrupts and traps, special consideration must be
given to the occurrence of an interrupt or trap between a branch
or a call subroutine instruction and the delayed-branch
instruction which follows it. In this case, the processor 10
must cause the delayed-branch instruction to be executed after
return from the interrupt or trap routine, in addition to the
target instruction to which control passes by virtue of the
branch or call instruction. To assure this result when returning
from an interrupt or trap routine, processor 10 must execute two
branches: a first branch which causes execution of the branch delay
instruction which was pre-empted by the occurrence of the
interrupt or trap, and a second branch which causes execution of
the target instruction which followed the delayed-branch
instruction.

The various stages of the instruction pipeline employed by
processor 10 to effect execution of an interrupt or trap routine
occurring between a branch or subroutine call instruction and the
delayed-branch instruction which follows it are illustrated in
Fig. 3A. The pipeline stages employed by processor 10 to effect
return from the interrupt or trap routine are illustrated in
Fig. 3B. With reference to Fig. 3A, the operation of processor
10 is illustrated by an interrupt occurring at time t1.
Modifications to the latter procedure will not be described
herein as they can be provided by those skilled in the art. For
purposes of illustration, a shift instruction is shown as fetched
from instruction cache 30 in the preceding cycle extending from
t0 to t1. The pipeline also contains, for purposes of
illustration, a jump instruction, followed by an add instruction,
followed by the shift instruction. Since the jump instruction
was executed just before occurrence of the interrupt and the add
and shift instructions had yet to be executed, it will be
necessary for processor 10 to return from the interrupt routine
and then execute the add and shift instructions. The addresses
of these instructions must be saved before transfer to the
interrupt routine. Accordingly, a "SAVE_PC_JUMP" instruction is
indicated in Fig. 3A as fetched during the cycle extending from
t1 to t2, following occurrence of the interrupt. This will cause
processor 10 to save the address of the branch delay instruction,
namely the add instruction which was to execute during the cycle
extending from t1 to t2 following occurrence of the interrupt.
The contents of the execute PC register 36 (Fig. 1) portion of
the PC stack 32 will accordingly be saved. Also, the
"SAVE_PC_JUMP" instruction will cause processor 10 to generate the
contents of the BRN TGT multiplexer/register 12 onto data path
24, and via MUX 26, onto data path 28 and therefrom to
instruction cache 30, these contents being the address of the
first instruction of the interrupt routine. As shown in Fig. 3A,
processor 10 will fetch during the cycle extending from t2 to t3
a "SAVE_PC" instruction, which will cause processor 10 to save
the address of the target instruction, namely, the shift
instruction, which would have normally followed the add
instruction. The first instruction of the interrupt routine,
designated the "INTERRUPT HANDLER" in Fig. 3A, will then be
fetched by processor 10 during the cycle extending from t3 to t4.
The decoding and execution of the shift and add
instructions, respectively, are accordingly aborted as indicated
in Fig. 3A by the designations "(ADD)" and "(SHIFT)" during the
decoding and execution stages. During subsequent store back
stages, the "SAVE_PC_JUMP" and "SAVE_PC" instructions cause
processor 10 to save the addresses of the add instruction and the
shift instruction, as indicated in Fig. 3A.

With reference to Fig. 3B, the interrupt routine will
complete by causing the processor 10 to fetch two jump indirect
instructions, which are shown in Fig. 3B as being decoded during
the cycles extending from t0' to t1' and t1' to t2'. To return
from the interrupt routine then, processor 10 will perform an
indirect jump via the value saved by the "SAVE_PC" instruction
described in connection with Fig. 3A and will
accordingly fetch the add instruction from cache 30 during the
cycle extending from t1' to t2' and will perform an indirect jump
via the value saved by the "SAVE_PC_JUMP" instruction and will
accordingly fetch the shift instruction from cache 30 during the
cycle extending from t2' to t3' as shown in Fig. 3B. Thus, the
processor 10 will execute these instructions in their order of
occurrence in the pipeline just prior to the occurrence of the
interrupt.
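
Why two consecutive indirect jumps restore the pre-empted order can be seen in a small model of the return sequence. This is a sketch under assumptions: the second jump is taken to sit at the address physically following the first, each jump redirects the fetch two cycles after it is itself fetched, and all names (`return_fetches`, `handler_exit`, `saved_delay_pc`, `saved_target_pc`) are hypothetical.

```python
def return_fetches(handler_exit, saved_delay_pc, saved_target_pc, cycles=5):
    """Addresses fetched while returning through two indirect jumps."""
    # Hypothetical encoding: address of a jump -> its indirect destination.
    jumps = {
        handler_exit: saved_delay_pc,        # first jump: branch delay instruction
        handler_exit + 1: saved_target_pc,   # second jump: target instruction
    }
    fetched, prog_cnt = [], handler_exit
    for _ in range(cycles):
        # A jump fetched two cycles earlier redirects the current fetch;
        # otherwise the continue address is used.
        source = fetched[-2] if len(fetched) >= 2 else None
        address = jumps.get(source, prog_cnt)
        fetched.append(address)
        prog_cnt = address + 1
    return fetched
```

For example, `return_fetches(200, 51, 80)` gives `[200, 201, 51, 80, 81]`: the saved branch delay instruction at 51 is fetched alone, the saved target at 80 follows it, and sequential fetching resumes from there.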
CLAIMS

1. A digital instruction processor control which cyclically
executes, in a single cycle, instructions from a set, including a
plurality of plural-bit branch instructions, stored in an
instruction cache having a plurality of locations each with a
designator, said processor control comprising:

means for generating signals indicative of a "continue with
next instruction" address;

means responsive to said continue address signals and to
predetermined bit portions of said branch instructions for
generating signals indicative of a "branch target" address; and

first multiplexer means having an output terminal connected
to said instruction cache responsive to a control signal
indicative of the contents of a predetermined "condition" bit
portion of said branch instruction, to said branch target address
signals applied to a first input terminal thereof and said
continue address signals applied to a second input terminal
thereof for selectively conducting to said output terminal one of
said address signals indicative of a location within said
instruction cache from which to fetch the next instruction to be
processed by said instruction processor.

2. A digital instruction processor control according to
claim 1 wherein said branch target address generating means
comprises:

second multiplexer/register means having an output terminal
connected to said first input terminal of said first multiplexer
means responsive to a control signal indicative of which of said
plurality of branch instructions is being executed, to signals
applied to a first input terminal indicative of an "indirect
address" determined by said branch instructions, and to signals
applied to a second input terminal indicative of a "relative or
absolute branch address" for selectively conducting to said
output terminal one of said address signals indicative of said
branch target address; and

adder means having an output terminal connected to said
second input terminal of said second multiplexer/register means
responsive to said control signal indicative of which of said
plurality of branch instructions is being executed, to signals
applied to a first input terminal indicative of a predetermined
"branch displacement" portion of said branch instructions, and to
said continue address signals applied to a second input terminal
for selectively generating at said output terminal an
arithmetical combination of said signals applied to said first
and second input terminals.

3. A digital instruction processor control according to
claim 1 further including means connected to said output terminal
of said first multiplexer means for storing at least three
signals indicative of instruction cache location designators,
each instruction therein designated occupying a stage in an
instruction "pipeline", for generating signals representative of
said contents stored therein, and for updating the contents of
said location designators stored therein so that said instruction
cache location designator conducted by said first multiplexer
during the preceding cycle of said instruction processor replaces
the contents of a first storage location thereof, the instruction
cache location designator stored in said first storage location
replaces the contents of a second storage location, and the
instruction cache location designator stored in said second
storage location replaces the contents of a third storage
location.

4. A digital instruction processor according to claim 3
wherein said pipeline means comprises a first clocked register
having an input terminal connected to said output terminal of
said first multiplexer means and an output terminal, a second
clocked register having an input terminal connected to said
output terminal of said first register and an output terminal,
and a third register having an input terminal connected to said
output terminal of said second register.

5. A method of performing branches in one cycle of a
digital instruction processor control having a program counter
which cyclically executes instructions from a set, including a
plurality of branch instructions each determining a "branch
condition", stored in an instruction cache having a plurality of
locations each with a designator, comprising the steps of:

a) fetching from said cache at the location designator
specified by the contents of said program counter a branch
instruction and storing said instruction in an instruction
register;

b) decoding said instruction stored in said instruction
register;

c) saving said branch condition determined by said
instruction;

d) determining a branch target address based on the
information generated at decoding step (c), and the contents of
said program counter;

e) fetching an instruction from a location in said cache
determined from said branch target address determined at step
(d), and said branch condition information generated at step (c);
and

f) replacing the contents of the program counter with the
address used to fetch said instruction at step (e) incremented by
one.

6. A one-cycle branching method according to claim 5
further including a method for calling a procedure in one cycle
wherein said instruction set further includes a procedure call
instruction, wherein step (a) calls for fetching said procedure
call instruction, said method further including the steps of:

g) determining a call return address based on the
information generated at decoding step (b) and the contents of
said program counter; and

h) saving said call return address determined at step (g).

7. A method of processing an interrupt routine by a digital
instruction processor control having an instruction pipeline
which cyclically executes instructions from a set stored in an
instruction cache having a plurality of locations each with a
designator, comprising the steps of:

a) saving the location designator of the instruction placed
in the pipeline two cycles prior to the occurrence of the
interrupt;

b) saving the location designator of the instruction placed
in the pipeline one cycle prior to the occurrence of the
interrupt;

c) transferring control to the first instruction of said
interrupt routine;

d) prior to returning from said interrupt routine fetching
for said pipeline the instruction located at the location
designator saved at step (a); and

e) prior to returning from said interrupt routine fetching
for said pipeline the instruction located at the location
designator saved at step (b).

8. An interrupt processing method according to claim 7
wherein said instruction processor control has a program counter
and a data path pipeline, said instruction set includes an
interrupt procedure call instruction, and wherein transferring
control step (c) comprises the steps of:

c1) fetching from said cache at the location designator
specified by the contents of said program counter said interrupt
procedure call instruction and storing said instruction in an
instruction register;

c2) decoding said instruction stored in said instruction
register;

c3) determining a branch target address based on the
information generated at decoding step (c2), and the contents of
said program counter;

c4) fetching an instruction from a location in said cache
determined from said branch target address determined at step
(c3);

c5) replacing the contents of the program counter with the
address used to fetch said instruction at step (c4) incremented
by four; and

c6) conducting the contents of said program counter to said
data path pipeline for use in subsequent storage operations.

9. An interrupt processing method according to claim 7,
further including the steps of:

f) following return from said interrupt routine fetching
from said cache at the location designator saved at step (a) and
storing said instruction in an instruction register; and

g) following return from said interrupt routine fetching
from said cache at the location designator saved at step (b) and
storing said instruction in an instruction register.

10. An interrupt processing method according to claim 9,
wherein said instruction processor control has a program counter,
said instruction set includes an interrupt procedure call
instruction, and wherein fetching step (f) comprises
the steps of:

f1) fetching from said cache at the location designator
specified by the contents of said program counter said indirect
branch instruction and storing said instruction in an
instruction register;

f2) decoding said instruction stored in said instruction
register;

f3) determining an indirect branch address based on the
information generated at decoding step (f2), and the contents of
said program counter;

f4) fetching an instruction from a location in said cache
determined from said indirect branch address determined at step
(f3); and

f5) replacing the contents of the program counter with the
address used to fetch said instruction at step (f4) incremented
by one.